Research questions:
Before imputation, I excluded participants who reported a sexual preference for “both” genders or for the same gender (n = 76). I further excluded people who did not identify as black or coloured (n = 7) and people who did not report any partners in the previous year (n = 170). Participants with missing observations on those characteristics were left in the dataset. This left 1074 relationships reported by 647 participants, of whom 185 reported more than one relationship in the previous year. I imputed 50 datasets using the random forest method for continuous and nominal categorical variables and the “polr” method for ordinal variables.
Here I used linear mixed effects models that were stratified by sex and contained a random intercept for the participant. For the plots, I randomly selected one imputed dataset to visualize the pattern in that population. I did not explicitly model heteroskedastic variance, as we did with the LNS data, because plots of the residuals did not indicate increasing variance with age.
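As a sketch of the model structure described above, the random-intercept model fitted within each sex can be written as follows. The outcome symbol $y_{ij}$ and the single linear age term are placeholders assumed for illustration; the text does not list the outcome or the fixed effects.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{age}_{ij} + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $i$ indexes participants and $j$ indexes the relationships reported by participant $i$. The term $b_i$ is the participant-level random intercept, so $\sigma_b$ is the between-subject (intercept) SD. The constant residual variance $\sigma^2$ reflects the decision not to model heteroskedasticity.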
For some of the models on some of the datasets (e.g. imputation 9 in the HIV positive female sub-population), I received this error message when trying to obtain the CIs for the between-subject (intercept) SD: “cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance”. According to Jose Pinheiro, this “indicates that, although the optimization algorithm converged (according to the criteria defined in the ms() function), the Hessian matrix calculated at the converged values was not negative-definite and therefore an approximate covariance matrix for the MLE’s could not be obtained. This is generally caused by a flat log-likelihood surface, for which the algorithm decided that no further improvements were possible and declared convergence. This is an indication that the model may be overparameterized and that you should cut down in the number of parameters.”
In the power variance function coefficient plot, 20 imputations were removed because of the “Non-positive definite approximate variance-covariance” error.
Table 1. Mean bridgewidth by age category and gender in a randomly selected imputed dataset

| Age category | Male n | Male mean | Male SD | Male SEM | Female n | Female mean | Female SD | Female SEM |
|---|---|---|---|---|---|---|---|---|
| 15-24 | 37 | 3.38 | 4.54 | 0.75 | 82 | 3.26 | 10.33 | 1.14 |
| 25-34 | 49 | 3.84 | 7.62 | 1.09 | 157 | 1.13 | 4.16 | 0.33 |
| 35-44 | 42 | 1.67 | 3.97 | 0.61 | 120 | 1.40 | 5.37 | 0.49 |
| 45-54 | 48 | 3.65 | 9.63 | 1.39 | 49 | 0.82 | 3.24 | 0.46 |
| 55-70 | 33 | 3.64 | 9.17 | 1.60 | 29 | 4.34 | 12.15 | 2.26 |

SD, standard deviation; SEM, standard error of the mean.
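The SEM column in Table 1 follows from the SD and n columns as SEM = SD / sqrt(n). The short Python sketch below recomputes it from the reported values as a consistency check. The names `TABLE_1`, `sem`, and `check_table` are illustrative and not part of the original analysis, and the tolerance of 0.01 is an assumption that allows for the two-decimal rounding of the reported SDs.

```python
import math

# (age category, sex, n, SD, reported SEM), copied from Table 1
TABLE_1 = [
    ("15-24", "Male", 37, 4.54, 0.75),
    ("25-34", "Male", 49, 7.62, 1.09),
    ("35-44", "Male", 42, 3.97, 0.61),
    ("45-54", "Male", 48, 9.63, 1.39),
    ("55-70", "Male", 33, 9.17, 1.60),
    ("15-24", "Female", 82, 10.33, 1.14),
    ("25-34", "Female", 157, 4.16, 0.33),
    ("35-44", "Female", 120, 5.37, 0.49),
    ("45-54", "Female", 49, 3.24, 0.46),
    ("55-70", "Female", 29, 12.15, 2.26),
]


def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


def check_table(rows, tol=0.01):
    """Return the rows whose recomputed SEM differs from the reported SEM by more than tol."""
    mismatches = []
    for age, sex, n, sd, reported in rows:
        computed = sem(sd, n)
        if abs(computed - reported) > tol:
            mismatches.append((age, sex, round(computed, 2), reported))
    return mismatches


if __name__ == "__main__":
    for age, sex, n, sd, reported in TABLE_1:
        print(f"{age:>5} {sex:<6} n={n:<3} SEM={sem(sd, n):.2f} (reported {reported:.2f})")
    print("mismatches:", check_table(TABLE_1))
```

With the values above, every recomputed SEM agrees with the reported one to within rounding, so `check_table(TABLE_1)` returns an empty list.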
Here I used generalized additive models with a negative binomial distribution to regress bridgewidths on the HIV status of the participant before entering our study. In these models, the participant is the unit of observation and only participants reporting more than one partner in the previous year were included. Separate models were fitted for men and women and for each imputed dataset. The models adjust for age (smooth term) and race.
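Written out, one such model takes roughly the following form. This is a sketch that assumes the usual log link for negative binomial regression, which the text does not state; the symbols $B_i$ (bridgewidth of participant $i$), $\theta$ (dispersion parameter), and $f$ (smooth function of age) are notation introduced here for illustration.

```latex
B_i \sim \mathrm{NegBin}(\mu_i, \theta), \qquad
\log \mu_i = \beta_0 + \beta_1\,\mathrm{HIV}_i + \beta_2\,\mathrm{race}_i + f(\mathrm{age}_i)
```

Under this parameterization, $\exp(\beta_1)$ is the ratio of expected bridgewidths comparing HIV positive with HIV negative participants of the same age and race.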
Here, I regressed a binary partnership-level concurrency indicator on bridgewidths using a generalized additive logistic regression model. Again, the participant was the unit of observation and only participants reporting more than one relationship in the previous year were included. Models were applied to different imputed datasets and stratified by gender. All models adjust for age (smooth term) and race. Bridgewidth was treated as a continuous linear term because exploratory GAMs indicated that a linear term was adequate.
The following plots visualize models of whether the HIV status of a participant was associated with having a subsequent concurrent relationship (within the year before the survey). In these models, HIV was our exposure of interest and bridgewidth a hypothesized mediator. Again, these were adjusted for age (smooth term) and race.
I regressed a binary relationship-level condom use indicator on bridgewidths using generalized additive mixed models with a logistic outcome and a random intercept for participant. The relationship was the unit of observation and only relationships from participants reporting more than one relationship in the previous year were included. Models were applied to different imputed datasets and stratified by gender. All models adjust for age (smooth term) and race. Bridgewidth was treated as a continuous linear term because exploratory GAMs indicated that a linear term was adequate.
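As a sketch of this specification, with notation introduced here for illustration, let $C_{ij}$ be the condom use indicator for relationship $j$ of participant $i$ and $B_i$ the bridgewidth of that participant. Treating bridgewidth as a participant-level quantity is an assumption carried over from the earlier models.

```latex
\operatorname{logit} \Pr(C_{ij} = 1)
  = \beta_0 + \beta_1\,B_i + \beta_2\,\mathrm{race}_i + f(\mathrm{age}_i) + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2)
```

The random intercept $b_i$ accounts for the correlation between relationships reported by the same participant, and $\exp(\beta_1)$ is the odds ratio per unit increase in bridgewidth, conditional on the participant.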
The following plots visualize models of whether the HIV status of a participant was associated with always using a condom in relationships. In these models, HIV was our exposure of interest and bridgewidth a hypothesized mediator. Again, these were adjusted for age (smooth term) and race.
I regressed sex frequency on bridgewidth. Sex frequency is a relationship-level variable that represents the average number of times a participant had sex per week with that partner. I used generalized additive mixed models with a Poisson outcome and a random intercept for participant. The relationship was the unit of observation and only relationships from participants reporting more than one relationship in the previous year were included. Models were applied to different imputed datasets and stratified by gender. All models adjust for age (smooth term) and race. Bridgewidth was treated as a continuous linear term because exploratory GAMs indicated that a linear term was adequate.
The following plots visualize models of whether the HIV status of a participant was associated with sex frequency in relationships. In these models, HIV was our exposure of interest and bridgewidth a hypothesized mediator. Again, these were adjusted for age (smooth term) and race.